tutorials/018 - QuickSight.ipynb (1,309 lines of code) (raw):

{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "[![AWS SDK for pandas](_static/logo.png \"AWS SDK for pandas\")](https://github.com/aws/aws-sdk-pandas)\n", "\n", "# 18 - QuickSight\n", "\n", "For this tutorial we will use the public AWS COVID-19 data lake.\n", "\n", "References:\n", "\n", "* [A public data lake for analysis of COVID-19 data](https://aws.amazon.com/blogs/big-data/a-public-data-lake-for-analysis-of-covid-19-data/)\n", "* [Exploring the public AWS COVID-19 data lake](https://aws.amazon.com/blogs/big-data/exploring-the-public-aws-covid-19-data-lake/)\n", "* [CloudFormation template](https://covid19-lake.s3.us-east-2.amazonaws.com/cfn/CovidLakeStack.template.json)\n", "\n", "*Please, install the CloudFormation template above to have access to the public data lake.*\n", "\n", "*P.S. To be able to access the public data lake, you must allow explicitly QuickSight to access the related external bucket.*" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [], "source": [ "from time import sleep\n", "\n", "import awswrangler as wr" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "<strong>List users of QuickSight account<strong>" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "data": { "text/plain": [ "[{'username': 'dev', 'role': 'ADMIN'}]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[{\"username\": user[\"UserName\"], \"role\": user[\"Role\"]} for user in wr.quicksight.list_users(\"default\")]" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Database</th>\n", " <th>Description</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>aws_sdk_pandas</td>\n", " <td>AWS SDK for pandas Test Arena - Glue Database</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>awswrangler_test</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>covid-19</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>default</td>\n", " <td>Default Hive database</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Database Description\n", "0 aws_sdk_pandas AWS SDK for pandas Test Arena - Glue Database\n", "1 awswrangler_test \n", "2 covid-19 \n", "3 default Default Hive database" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wr.catalog.databases()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Database</th>\n", " <th>Table</th>\n", " <th>Description</th>\n", " <th>Columns</th>\n", " <th>Partitions</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>covid-19</td>\n", " <td>alleninstitute_comprehend_medical</td>\n", " <td>Comprehend Medical results run against Allen I...</td>\n", " <td>paper_id, date, dx_name, test_name, procedure_...</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>covid-19</td>\n", " <td>alleninstitute_metadata</td>\n", " <td>Metadata on papers pulled from the Allen Insti...</td>\n", " <td>cord_uid, sha, source_x, title, doi, pmcid, pu...</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>covid-19</td>\n", " <td>country_codes</td>\n", " <td>Lookup table for country codes</td>\n", " <td>country, alpha-2 code, alpha-3 code, numeric c...</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>covid-19</td>\n", " <td>county_populations</td>\n", " <td>Lookup table for population for each county ba...</td>\n", " <td>id, id2, county, state, population estimate 2018</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>covid-19</td>\n", " <td>covid_knowledge_graph_edges</td>\n", " <td>AWS Knowledge Graph for COVID-19 data</td>\n", " <td>id, label, from, to, score</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>5</th>\n", " <td>covid-19</td>\n", " <td>covid_knowledge_graph_nodes_author</td>\n", " <td>AWS Knowledge Graph for COVID-19 data</td>\n", " <td>id, label, first, last, full_name</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>6</th>\n", " <td>covid-19</td>\n", " <td>covid_knowledge_graph_nodes_concept</td>\n", " <td>AWS Knowledge Graph for COVID-19 data</td>\n", " <td>id, label, entity, concept</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>7</th>\n", " <td>covid-19</td>\n", " <td>covid_knowledge_graph_nodes_institution</td>\n", " <td>AWS Knowledge Graph for COVID-19 data</td>\n", " <td>id, label, institution, country, settlement</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>8</th>\n", " <td>covid-19</td>\n", " <td>covid_knowledge_graph_nodes_paper</td>\n", " <td>AWS Knowledge Graph for COVID-19 data</td>\n", " <td>id, label, doi, sha_code, publish_time, source...</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>9</th>\n", " <td>covid-19</td>\n", " <td>covid_knowledge_graph_nodes_topic</td>\n", " <td>AWS Knowledge Graph for COVID-19 data</td>\n", " <td>id, label, topic, topic_num</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>10</th>\n", " <td>covid-19</td>\n", " <td>covid_testing_states_daily</td>\n", " <td>USA total test daily trend by state. Sourced ...</td>\n", " <td>date, state, positive, negative, pending, hosp...</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>11</th>\n", " <td>covid-19</td>\n", " <td>covid_testing_us_daily</td>\n", " <td>USA total test daily trend. Sourced from covi...</td>\n", " <td>date, states, positive, negative, posneg, pend...</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>12</th>\n", " <td>covid-19</td>\n", " <td>covid_testing_us_total</td>\n", " <td>USA total tests. Sourced from covidtracking.c...</td>\n", " <td>positive, negative, posneg, hospitalized, deat...</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>13</th>\n", " <td>covid-19</td>\n", " <td>covidcast_data</td>\n", " <td>CMU Delphi's COVID-19 Surveillance Data</td>\n", " <td>data_source, signal, geo_type, time_value, geo...</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>14</th>\n", " <td>covid-19</td>\n", " <td>covidcast_metadata</td>\n", " <td>CMU Delphi's COVID-19 Surveillance Metadata</td>\n", " <td>data_source, signal, time_type, geo_type, min_...</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>15</th>\n", " <td>covid-19</td>\n", " <td>enigma_jhu</td>\n", " <td>Johns Hopkins University Consolidated data on ...</td>\n", " <td>fips, admin2, province_state, country_region, ...</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>16</th>\n", " <td>covid-19</td>\n", " <td>enigma_jhu_timeseries</td>\n", " <td>Johns Hopkins University data on COVID-19 case...</td>\n", " <td>uid, fips, iso2, iso3, code3, admin2, latitude...</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>17</th>\n", " <td>covid-19</td>\n", " <td>hospital_beds</td>\n", " <td>Data on hospital beds and their utilization in...</td>\n", " <td>objectid, hospital_name, hospital_type, hq_add...</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>18</th>\n", " <td>covid-19</td>\n", " <td>nytimes_counties</td>\n", " <td>Data on COVID-19 cases from NY Times at US cou...</td>\n", " <td>date, county, state, fips, cases, deaths</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>19</th>\n", " <td>covid-19</td>\n", " <td>nytimes_states</td>\n", " <td>Data on COVID-19 cases from NY Times at US sta...</td>\n", " <td>date, state, fips, cases, deaths</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>20</th>\n", " <td>covid-19</td>\n", " <td>prediction_models_county_predictions</td>\n", " <td>County-level Predictions Data. Sourced from Yu...</td>\n", " <td>countyfips, countyname, statename, severity_co...</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>21</th>\n", " <td>covid-19</td>\n", " <td>prediction_models_severity_index</td>\n", " <td>Severity Index models. Sourced from Yu Group a...</td>\n", " <td>severity_1-day, severity_2-day, severity_3-day...</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>22</th>\n", " <td>covid-19</td>\n", " <td>tableau_covid_datahub</td>\n", " <td>COVID-19 data that has been gathered and unifi...</td>\n", " <td>country_short_name, country_alpha_3_code, coun...</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>23</th>\n", " <td>covid-19</td>\n", " <td>tableau_jhu</td>\n", " <td>Johns Hopkins University data on COVID-19 case...</td>\n", " <td>case_type, cases, difference, date, country_re...</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>24</th>\n", " <td>covid-19</td>\n", " <td>us_state_abbreviations</td>\n", " <td>Lookup table for US state abbreviations</td>\n", " <td>state, abbreviation</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>25</th>\n", " <td>covid-19</td>\n", " <td>world_cases_deaths_testing</td>\n", " <td>Data on confirmed cases, deaths, and testing. ...</td>\n", " <td>iso_code, location, date, total_cases, new_cas...</td>\n", " <td></td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Database Table \\\n", "0 covid-19 alleninstitute_comprehend_medical \n", "1 covid-19 alleninstitute_metadata \n", "2 covid-19 country_codes \n", "3 covid-19 county_populations \n", "4 covid-19 covid_knowledge_graph_edges \n", "5 covid-19 covid_knowledge_graph_nodes_author \n", "6 covid-19 covid_knowledge_graph_nodes_concept \n", "7 covid-19 covid_knowledge_graph_nodes_institution \n", "8 covid-19 covid_knowledge_graph_nodes_paper \n", "9 covid-19 covid_knowledge_graph_nodes_topic \n", "10 covid-19 covid_testing_states_daily \n", "11 covid-19 covid_testing_us_daily \n", "12 covid-19 covid_testing_us_total \n", "13 covid-19 covidcast_data \n", "14 covid-19 covidcast_metadata \n", "15 covid-19 enigma_jhu \n", "16 covid-19 enigma_jhu_timeseries \n", "17 covid-19 hospital_beds \n", "18 covid-19 nytimes_counties \n", "19 covid-19 nytimes_states \n", "20 covid-19 prediction_models_county_predictions \n", "21 covid-19 prediction_models_severity_index \n", "22 covid-19 tableau_covid_datahub \n", "23 covid-19 tableau_jhu \n", "24 covid-19 us_state_abbreviations \n", "25 covid-19 world_cases_deaths_testing \n", "\n", " Description \\\n", "0 Comprehend Medical results run against Allen I... \n", "1 Metadata on papers pulled from the Allen Insti... \n", "2 Lookup table for country codes \n", "3 Lookup table for population for each county ba... \n", "4 AWS Knowledge Graph for COVID-19 data \n", "5 AWS Knowledge Graph for COVID-19 data \n", "6 AWS Knowledge Graph for COVID-19 data \n", "7 AWS Knowledge Graph for COVID-19 data \n", "8 AWS Knowledge Graph for COVID-19 data \n", "9 AWS Knowledge Graph for COVID-19 data \n", "10 USA total test daily trend by state. Sourced ... \n", "11 USA total test daily trend. Sourced from covi... \n", "12 USA total tests. Sourced from covidtracking.c... \n", "13 CMU Delphi's COVID-19 Surveillance Data \n", "14 CMU Delphi's COVID-19 Surveillance Metadata \n", "15 Johns Hopkins University Consolidated data on ... \n", "16 Johns Hopkins University data on COVID-19 case... \n", "17 Data on hospital beds and their utilization in... \n", "18 Data on COVID-19 cases from NY Times at US cou... \n", "19 Data on COVID-19 cases from NY Times at US sta... \n", "20 County-level Predictions Data. Sourced from Yu... \n", "21 Severity Index models. Sourced from Yu Group a... \n", "22 COVID-19 data that has been gathered and unifi... \n", "23 Johns Hopkins University data on COVID-19 case... \n", "24 Lookup table for US state abbreviations \n", "25 Data on confirmed cases, deaths, and testing. ... \n", "\n", " Columns Partitions \n", "0 paper_id, date, dx_name, test_name, procedure_... \n", "1 cord_uid, sha, source_x, title, doi, pmcid, pu... \n", "2 country, alpha-2 code, alpha-3 code, numeric c... \n", "3 id, id2, county, state, population estimate 2018 \n", "4 id, label, from, to, score \n", "5 id, label, first, last, full_name \n", "6 id, label, entity, concept \n", "7 id, label, institution, country, settlement \n", "8 id, label, doi, sha_code, publish_time, source... \n", "9 id, label, topic, topic_num \n", "10 date, state, positive, negative, pending, hosp... \n", "11 date, states, positive, negative, posneg, pend... \n", "12 positive, negative, posneg, hospitalized, deat... \n", "13 data_source, signal, geo_type, time_value, geo... \n", "14 data_source, signal, time_type, geo_type, min_... \n", "15 fips, admin2, province_state, country_region, ... \n", "16 uid, fips, iso2, iso3, code3, admin2, latitude... \n", "17 objectid, hospital_name, hospital_type, hq_add... \n", "18 date, county, state, fips, cases, deaths \n", "19 date, state, fips, cases, deaths \n", "20 countyfips, countyname, statename, severity_co... \n", "21 severity_1-day, severity_2-day, severity_3-day... \n", "22 country_short_name, country_alpha_3_code, coun... \n", "23 case_type, cases, difference, date, country_re... \n", "24 state, abbreviation \n", "25 iso_code, location, date, total_cases, new_cas... " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wr.catalog.tables(database=\"covid-19\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "<strong>Create data source of QuickSight<strong>\n", "Note: data source stores the connection information." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "wr.quicksight.create_athena_data_source(\n", " name=\"covid-19\",\n", " workgroup=\"primary\",\n", " allowed_to_manage={\"users\": [\"dev\"]},\n", ")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Database</th>\n", " <th>Table</th>\n", " <th>Description</th>\n", " <th>Columns</th>\n", " <th>Partitions</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>covid-19</td>\n", " <td>nytimes_counties</td>\n", " <td>Data on COVID-19 cases from NY Times at US cou...</td>\n", " <td>date, county, state, fips, cases, deaths</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>covid-19</td>\n", " <td>nytimes_states</td>\n", " <td>Data on COVID-19 cases from NY Times at US sta...</td>\n", " <td>date, state, fips, cases, deaths</td>\n", " <td></td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Database Table \\\n", "0 covid-19 nytimes_counties \n", "1 covid-19 nytimes_states \n", "\n", " Description \\\n", "0 Data on COVID-19 cases from NY Times at US cou... \n", "1 Data on COVID-19 cases from NY Times at US sta... \n", "\n", " Columns Partitions \n", "0 date, county, state, fips, cases, deaths \n", "1 date, state, fips, cases, deaths " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wr.catalog.tables(database=\"covid-19\", name_contains=\"nyt\")" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>date</th>\n", " <th>county</th>\n", " <th>state</th>\n", " <th>fips</th>\n", " <th>cases</th>\n", " <th>deaths</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>2020-01-21</td>\n", " <td>Snohomish</td>\n", " <td>Washington</td>\n", " <td>53061</td>\n", " <td>1</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>2020-01-22</td>\n", " <td>Snohomish</td>\n", " <td>Washington</td>\n", " <td>53061</td>\n", " <td>1</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>2020-01-23</td>\n", " <td>Snohomish</td>\n", " <td>Washington</td>\n", " <td>53061</td>\n", " <td>1</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>2020-01-24</td>\n", " <td>Cook</td>\n", " <td>Illinois</td>\n", " <td>17031</td>\n", " <td>1</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>2020-01-24</td>\n", " <td>Snohomish</td>\n", " <td>Washington</td>\n", " <td>53061</td>\n", " <td>1</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>5</th>\n", " <td>2020-01-25</td>\n", " <td>Orange</td>\n", " <td>California</td>\n", " <td>06059</td>\n", " <td>1</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>6</th>\n", " <td>2020-01-25</td>\n", " <td>Cook</td>\n", " <td>Illinois</td>\n", " <td>17031</td>\n", " <td>1</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>7</th>\n", " <td>2020-01-25</td>\n", " <td>Snohomish</td>\n", " <td>Washington</td>\n", " <td>53061</td>\n", " <td>1</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>8</th>\n", " <td>2020-01-26</td>\n", " <td>Maricopa</td>\n", " <td>Arizona</td>\n", " <td>04013</td>\n", " <td>1</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>9</th>\n", " <td>2020-01-26</td>\n", " <td>Los Angeles</td>\n", " <td>California</td>\n", " <td>06037</td>\n", " <td>1</td>\n", " <td>0</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " date county state fips cases deaths\n", "0 2020-01-21 Snohomish Washington 53061 1 0\n", "1 2020-01-22 Snohomish Washington 53061 1 0\n", "2 2020-01-23 Snohomish Washington 53061 1 0\n", "3 2020-01-24 Cook Illinois 17031 1 0\n", "4 2020-01-24 Snohomish Washington 53061 1 0\n", "5 2020-01-25 Orange California 06059 1 0\n", "6 2020-01-25 Cook Illinois 17031 1 0\n", "7 2020-01-25 Snohomish Washington 53061 1 0\n", "8 2020-01-26 Maricopa Arizona 04013 1 0\n", "9 2020-01-26 Los Angeles California 06037 1 0" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wr.athena.read_sql_query(\"SELECT * FROM nytimes_counties limit 10\", database=\"covid-19\", ctas_approach=False)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "pycharm": { "is_executing": false } }, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>date</th>\n", " <th>county</th>\n", " <th>state</th>\n", " <th>fips</th>\n", " <th>confirmed</th>\n", " <th>deaths</th>\n", " <th>population</th>\n", " <th>county2</th>\n", " <th>Hospital</th>\n", " <th>hospital_fips</th>\n", " <th>licensed_beds</th>\n", " <th>staffed_beds</th>\n", " <th>icu_beds</th>\n", " <th>bed_utilization</th>\n", " <th>potential_increase_bed_capacity</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>2020-04-12</td>\n", " <td>Park</td>\n", " <td>Montana</td>\n", " <td>30067</td>\n", " <td>7</td>\n", " <td>0</td>\n", " <td>16736</td>\n", " <td>Park</td>\n", " <td>0</td>\n", " <td>30067</td>\n", " <td>25</td>\n", " <td>25</td>\n", " <td>4</td>\n", " <td>0.432548</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>2020-04-12</td>\n", " <td>Ravalli</td>\n", " <td>Montana</td>\n", " <td>30081</td>\n", " <td>3</td>\n", " <td>0</td>\n", " <td>43172</td>\n", " <td>Ravalli</td>\n", " <td>0</td>\n", " <td>30081</td>\n", " <td>25</td>\n", " <td>25</td>\n", " <td>5</td>\n", " <td>0.567781</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>2020-04-12</td>\n", " <td>Silver Bow</td>\n", " <td>Montana</td>\n", " <td>30093</td>\n", " <td>11</td>\n", " <td>0</td>\n", " <td>34993</td>\n", " <td>Silver Bow</td>\n", " <td>0</td>\n", " <td>30093</td>\n", " <td>98</td>\n", " <td>71</td>\n", " <td>11</td>\n", " <td>0.551457</td>\n", " <td>27</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>2020-04-12</td>\n", " <td>Clay</td>\n", " <td>Nebraska</td>\n", " <td>31035</td>\n", " <td>2</td>\n", " <td>0</td>\n", " <td>6214</td>\n", " <td>Clay</td>\n", " <td>&lt;NA&gt;</td>\n", " <td>&lt;NA&gt;</td>\n", " <td>&lt;NA&gt;</td>\n", " <td>&lt;NA&gt;</td>\n", " <td>&lt;NA&gt;</td>\n", " <td>NaN</td>\n", " <td>&lt;NA&gt;</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>2020-04-12</td>\n", " <td>Cuming</td>\n", " <td>Nebraska</td>\n", " <td>31039</td>\n", " <td>2</td>\n", " <td>0</td>\n", " <td>8940</td>\n", " <td>Cuming</td>\n", " <td>0</td>\n", " <td>31039</td>\n", " <td>25</td>\n", " <td>25</td>\n", " <td>4</td>\n", " <td>0.204493</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>...</th>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " </tr>\n", " <tr>\n", " <th>227684</th>\n", " <td>2020-06-11</td>\n", " <td>Hockley</td>\n", " <td>Texas</td>\n", " <td>48219</td>\n", " <td>28</td>\n", " <td>1</td>\n", " <td>22980</td>\n", " <td>Hockley</td>\n", " <td>0</td>\n", " <td>48219</td>\n", " <td>48</td>\n", " <td>48</td>\n", " <td>8</td>\n", " <td>0.120605</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>227685</th>\n", " <td>2020-06-11</td>\n", " <td>Hudspeth</td>\n", " <td>Texas</td>\n", " <td>48229</td>\n", " <td>11</td>\n", " <td>0</td>\n", " <td>4795</td>\n", " <td>Hudspeth</td>\n", " <td>&lt;NA&gt;</td>\n", " <td>&lt;NA&gt;</td>\n", " <td>&lt;NA&gt;</td>\n", " <td>&lt;NA&gt;</td>\n", " <td>&lt;NA&gt;</td>\n", " <td>NaN</td>\n", " <td>&lt;NA&gt;</td>\n", " </tr>\n", " <tr>\n", " <th>227686</th>\n", " <td>2020-06-11</td>\n", " <td>Jones</td>\n", " <td>Texas</td>\n", " <td>48253</td>\n", " <td>633</td>\n", " <td>0</td>\n", " <td>19817</td>\n", " <td>Jones</td>\n", " <td>0</td>\n", " <td>48253</td>\n", " <td>45</td>\n", " <td>7</td>\n", " <td>1</td>\n", " <td>0.718591</td>\n", " <td>38</td>\n", " </tr>\n", " <tr>\n", " <th>227687</th>\n", " <td>2020-06-11</td>\n", " <td>La Salle</td>\n", " <td>Texas</td>\n", " <td>48283</td>\n", " <td>4</td>\n", " <td>0</td>\n", " <td>7531</td>\n", " <td>La Salle</td>\n", " <td>&lt;NA&gt;</td>\n", " <td>&lt;NA&gt;</td>\n", " <td>&lt;NA&gt;</td>\n", " <td>&lt;NA&gt;</td>\n", " <td>&lt;NA&gt;</td>\n", " <td>NaN</td>\n", " <td>&lt;NA&gt;</td>\n", " </tr>\n", " <tr>\n", " <th>227688</th>\n", " <td>2020-06-11</td>\n", " <td>Limestone</td>\n", " <td>Texas</td>\n", " <td>48293</td>\n", " <td>36</td>\n", " <td>1</td>\n", " <td>23519</td>\n", " <td>Limestone</td>\n", " <td>0</td>\n", " <td>48293</td>\n", " <td>78</td>\n", " <td>69</td>\n", " <td>9</td>\n", " <td>0.163940</td>\n", " <td>9</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>227689 rows × 15 columns</p>\n", "</div>" ], "text/plain": [ " date county state fips confirmed deaths population \\\n", "0 2020-04-12 Park Montana 30067 7 0 16736 \n", "1 2020-04-12 Ravalli Montana 30081 3 0 43172 \n", "2 2020-04-12 Silver Bow Montana 30093 11 0 34993 \n", "3 2020-04-12 Clay Nebraska 31035 2 0 6214 \n", "4 2020-04-12 Cuming Nebraska 31039 2 0 8940 \n", "... ... ... ... ... ... ... ... \n", "227684 2020-06-11 Hockley Texas 48219 28 1 22980 \n", "227685 2020-06-11 Hudspeth Texas 48229 11 0 4795 \n", "227686 2020-06-11 Jones Texas 48253 633 0 19817 \n", "227687 2020-06-11 La Salle Texas 48283 4 0 7531 \n", "227688 2020-06-11 Limestone Texas 48293 36 1 23519 \n", "\n", " county2 Hospital hospital_fips licensed_beds staffed_beds \\\n", "0 Park 0 30067 25 25 \n", "1 Ravalli 0 30081 25 25 \n", "2 Silver Bow 0 30093 98 71 \n", "3 Clay <NA> <NA> <NA> <NA> \n", "4 Cuming 0 31039 25 25 \n", "... ... ... ... ... ... \n", "227684 Hockley 0 48219 48 48 \n", "227685 Hudspeth <NA> <NA> <NA> <NA> \n", "227686 Jones 0 48253 45 7 \n", "227687 La Salle <NA> <NA> <NA> <NA> \n", "227688 Limestone 0 48293 78 69 \n", "\n", " icu_beds bed_utilization potential_increase_bed_capacity \n", "0 4 0.432548 0 \n", "1 5 0.567781 0 \n", "2 11 0.551457 27 \n", "3 <NA> NaN <NA> \n", "4 4 0.204493 0 \n", "... ... ... ... \n", "227684 8 0.120605 0 \n", "227685 <NA> NaN <NA> \n", "227686 1 0.718591 38 \n", "227687 <NA> NaN <NA> \n", "227688 9 0.163940 9 \n", "\n", "[227689 rows x 15 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sql = \"\"\"\n", "SELECT\n", " j.*,\n", " co.Population,\n", " co.county AS county2,\n", " hb.*\n", "FROM\n", " (\n", " SELECT\n", " date,\n", " county,\n", " state,\n", " fips,\n", " cases as confirmed,\n", " deaths\n", " FROM \"covid-19\".nytimes_counties\n", " ) j\n", " LEFT OUTER JOIN (\n", " SELECT\n", " DISTINCT county,\n", " state,\n", " \"population estimate 2018\" AS Population\n", " FROM\n", " \"covid-19\".county_populations\n", " WHERE\n", " state IN (\n", " SELECT\n", " DISTINCT state\n", " FROM\n", " \"covid-19\".nytimes_counties\n", " )\n", " AND county IN (\n", " SELECT\n", " DISTINCT county as county\n", " FROM \"covid-19\".nytimes_counties\n", " )\n", " ) co ON co.county = j.county\n", " AND co.state = j.state\n", " LEFT OUTER JOIN (\n", " SELECT\n", " count(objectid) as Hospital,\n", " fips as hospital_fips,\n", " sum(num_licensed_beds) as licensed_beds,\n", " sum(num_staffed_beds) as staffed_beds,\n", " sum(num_icu_beds) as icu_beds,\n", " avg(bed_utilization) as bed_utilization,\n", " sum(\n", " potential_increase_in_bed_capac\n", " ) as potential_increase_bed_capacity\n", " FROM \"covid-19\".hospital_beds\n", " WHERE\n", " fips in (\n", " SELECT\n", " DISTINCT fips\n", " FROM\n", " \"covid-19\".nytimes_counties\n", " )\n", " GROUP BY\n", " 2\n", " ) hb ON hb.hospital_fips = j.fips\n", "\"\"\"\n", "\n", "wr.athena.read_sql_query(sql, database=\"covid-19\", ctas_approach=False)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "<strong>Create Dataset with custom SQL option<strong>" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "wr.quicksight.create_athena_dataset(\n", " name=\"covid19-nytimes-usa\",\n", " sql=sql,\n", " sql_name=\"CustomSQL\",\n", " data_source_name=\"covid-19\",\n", " import_mode=\"SPICE\",\n", " allowed_to_manage={\"users\": [\"dev\"]},\n", ")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "ingestion_id = wr.quicksight.create_ingestion(\"covid19-nytimes-usa\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "<strong>Wait ingestion<strong>" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "while wr.quicksight.describe_ingestion(ingestion_id=ingestion_id, dataset_name=\"covid19-nytimes-usa\")[\n", " \"IngestionStatus\"\n", "] not in [\"COMPLETED\", \"FAILED\"]:\n", " sleep(1)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "<strong>Describe last ingestion<strong>" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'RowsIngested': 227689, 'RowsDropped': 0}" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wr.quicksight.describe_ingestion(ingestion_id=ingestion_id, dataset_name=\"covid19-nytimes-usa\")[\"RowInfo\"]" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "<strong>List all ingestions<strong>" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'time': datetime.datetime(2020, 6, 12, 15, 13, 46, 996000, tzinfo=tzlocal()),\n", " 'source': 'MANUAL'},\n", " {'time': datetime.datetime(2020, 6, 12, 15, 13, 42, 344000, tzinfo=tzlocal()),\n", " 'source': 'MANUAL'}]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[\n", " {\"time\": user[\"CreatedTime\"], \"source\": user[\"RequestSource\"]}\n", " for user in wr.quicksight.list_ingestions(\"covid19-nytimes-usa\")\n", "]" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "<strong>Create new dataset from a table directly<strong>" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "wr.quicksight.create_athena_dataset(\n", " name=\"covid-19-tableau_jhu\",\n", " table=\"tableau_jhu\",\n", " data_source_name=\"covid-19\",\n", " database=\"covid-19\",\n", " import_mode=\"DIRECT_QUERY\",\n", " rename_columns={\"cases\": \"Count_of_Cases\", \"combined_key\": \"County\"},\n", " cast_columns_types={\"Count_of_Cases\": \"INTEGER\"},\n", " tag_columns={\"combined_key\": [{\"ColumnGeographicRole\": \"COUNTY\"}]},\n", " allowed_to_manage={\"users\": [\"dev\"]},\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "<strong>Cleaning up<strong>" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "wr.quicksight.delete_data_source(\"covid-19\")\n", "wr.quicksight.delete_dataset(\"covid19-nytimes-usa\")\n", "wr.quicksight.delete_dataset(\"covid-19-tableau_jhu\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.9.14", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.14" }, "pycharm": { "stem_cell": { "cell_type": "raw", "metadata": { "collapsed": false }, "source": [] } } }, "nbformat": 4, "nbformat_minor": 4 }